Similarity Identification of Large-scale Biomedical Documents using Cosine Similarity and Parallel Computing
Authors
Abstract
Document similarity computation is an important research topic in information retrieval, and it is a crucial issue for automatic document categorization. The similarity value lies between 0 and 1: the closer it is to 1, the more relevant the two documents are considered, and vice versa. However, the large scale of textual data has made it difficult to determine the relevance level of documents. This study successfully identified the similarity of large-scale biomedical documents using the cosine similarity algorithm, with similarity measured on the results of a TF*IDF calculation. The similarity of PubMed MeSH heading text was higher than that of abstract text, owing to the words that the MeSH headings contain. Furthermore, parallel computing was implemented to speed up the large-scale identification process in an application that calculates similarity automatically. The execution time was 15.447 seconds with parallel computing, compared with 74.191 seconds without it. The results are presented as graphs and tables.
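The abstract describes measuring similarity with cosine similarity over TF*IDF weights and speeding up the all-pairs computation with parallel computing, but this page does not describe the authors' implementation. The sketch below is a minimal illustration under stated assumptions: whitespace tokenization, raw-count TF with idf = log(N / df), and Python's `concurrent.futures.ProcessPoolExecutor` as the parallel backend. The helper names (`tfidf_vectors`, `cosine`, `pairwise_similarity`, `_init_worker`, `_score_row`) are hypothetical. Cosine similarity is cos(A, B) = (A · B) / (‖A‖ ‖B‖); because TF*IDF weights are non-negative, the score falls between 0 and 1, matching the range stated in the abstract.

```python
import math
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

_VECTORS = None  # per-process copy of the TF*IDF vectors, set by _init_worker


def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split (no stemming, no stop words).
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per document.

    Assumed weighting (hypothetical, not taken from the paper):
    tf = raw term count, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def _init_worker(vectors):
    # Runs once per worker process, so vectors are shipped once, not once per pair.
    global _VECTORS
    _VECTORS = vectors


def _score_row(i):
    # One row of the upper-triangular similarity matrix: document i vs. every j > i.
    return [(i, j, cosine(_VECTORS[i], _VECTORS[j])) for j in range(i + 1, len(_VECTORS))]


def pairwise_similarity(docs, workers=1):
    """Return {(i, j): cosine similarity} for all document pairs with i < j."""
    vectors = tfidf_vectors(docs)
    if workers > 1:
        # Parallel path: rows are distributed across a pool of worker processes.
        with ProcessPoolExecutor(max_workers=workers, initializer=_init_worker,
                                 initargs=(vectors,)) as pool:
            rows = list(pool.map(_score_row, range(len(vectors))))
    else:
        # Serial path: same computation in the current process.
        _init_worker(vectors)
        rows = [_score_row(i) for i in range(len(vectors))]
    return {(i, j): s for row in rows for i, j, s in row}
```

For example, with the three documents "gene expression in cancer cells", "cancer cells and gene expression", and "protein folding simulation", the first two share four terms and score about 0.353, while each pair involving the third scores 0.0. Calling `pairwise_similarity(docs, workers=4)` spreads the rows of the similarity matrix across four processes; as with any `ProcessPoolExecutor` code, make that call under an `if __name__ == "__main__":` guard. Splitting by row rather than by pair keeps the number of tasks linear in the corpus size.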
Similar resources
Textual Spatial Cosine Similarity
When dealing with document similarity many methods exist today, like cosine similarity. More complex methods are also available based on the semantic analysis of textual information, which are computationally expensive and rarely used in the real-time feeding of content as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document simil...
Semantic Cosine Similarity
Cosine similarity is a widely implemented metric in information retrieval and related studies. This metric models a text as a vector of terms and the similarity between two texts is derived from cosine value between two texts' term vectors. Cosine similarity however still can't handle the semantic meaning of the text perfectly. This paper proposes an enhancement of cosine similarity measurement...
Transcript Segmentation Using Utterance Cosine Similarity Measure
One of the problems addressed by the Tracker project is the extraction of the key issues discussed at meetings through the analysis of transcripts. Whilst topic extraction is an easy task for humans, it has proven a difficult task to automate given the unstructured nature of our transcripts. This paper proposes a new approach to transcript segmentation based on the Utterance Cosine Sim...
Comparison Clustering using Cosine and Fuzzy set based Similarity Measures of Text Documents
Keeping in consideration the high demand for clustering, this paper focuses on understanding and implementing K-means clustering using two different similarity measures. We have tried to cluster the documents using two different measures rather than clustering them with Euclidean distance. Also, a comparison is drawn based on accuracy of clustering between fuzzy and cosine similarity measure. The ...
Large Scale Online Learning of Image Similarity Through Ranking
Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given ob...
Journal
Journal title: Knowledge engineering and data science
Year: 2022
ISSN: 2597-4602, 2597-4637
DOI: https://doi.org/10.17977/um018v4i22021p105-116